Let's not Argue about Semantics

Author

  • Johan Bos

Abstract

What’s the best way to assess the performance of a semantic component in an NLP system? Tradition in NLP evaluation tells us that comparing output against a gold standard is a good idea. To define a gold standard, one first needs to decide on the representation language, and in many cases a first-order language seems a good compromise between expressive power and efficiency. Secondly, one needs to decide how to represent the various semantic phenomena, in particular the depth of analysis of quantification, plurals, eventualities, thematic roles, scope, anaphora, presupposition, ellipsis, comparatives, superlatives, tense, aspect, and time expressions. Hence it will be hard to come up with an annotation scheme unless one permits different levels of semantic granularity. The alternative is a theory-neutral, black-box evaluation in which we simply look at how systems react to various inputs. For this approach, we can consider the well-known task of recognising textual entailment, or the lesser-known task of textual model checking. The disadvantage of black-box methods is that it is difficult to come up with natural data that cover specific semantic phenomena.

1. Evaluating Meaning

Formal methods for the analysis of the meaning of natural language expressions have long been restricted to the ivory tower built by semanticists, logicians, and philosophers of language. Only in exceptional cases did they make their way directly into open-domain NLP tools. Recently, this situation has changed. Thanks to the development of treebanks (large collections of texts annotated with syntactic structures), robust statistical parsers trained on such treebanks, and large-scale semantic lexica, we now have at our disposal systems that produce formal semantic representations with high coverage (Schiehlen, 1999; Bos et al., 2004; Bos, 2005; Copestake et al., 2005; Delmonte, 2006; Sato et al., 2006; Moldovan et al., 2007).
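The abstract’s point about semantic granularity can be made concrete with a small sketch of glass-box scoring. It assumes, purely for illustration, that each meaning representation has been flattened into a set of atomic clauses and that variable names in system output and gold standard are already aligned; a real comparison would also have to search for the best variable mapping. The names `Clause` and `clause_f1` and the clause format are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Tuple

# A clause is a predicate name followed by its arguments, e.g. ("dog", "x1").
# Hypothetical simplification: variables in system output and gold standard
# are assumed to be already aligned (same names for the same referents).
Clause = Tuple[str, ...]


def clause_f1(system: FrozenSet[Clause], gold: FrozenSet[Clause]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of the system clauses against the gold clauses."""
    matched = len(system & gold)  # clauses present in both analyses
    precision = matched / len(system) if system else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # "A dog barked": a gold analysis with an event variable, a thematic role, and tense.
    gold = frozenset({("dog", "x1"), ("bark", "e1"), ("agent", "e1", "x1"), ("past", "e1")})
    # A shallower system analysis that does not represent tense.
    system = frozenset({("dog", "x1"), ("bark", "e1"), ("agent", "e1", "x1")})
    print(clause_f1(system, gold))
```

Here the shallower analysis scores precision 1.0 but recall 0.75: whatever depth of analysis the gold standard commits to directly determines what counts as an error.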
Now, suppose we want to evaluate the semantic component of such NLP systems. How shall we go about it? Probably the most obvious way is to look at the semantic representations that the system produces and compare them with a gold-standard annotation. After all, that’s what we do when evaluating part-of-speech tagging, chunking, named entity recognition, and syntactic parsing. But what exactly should such a gold standard for meaning representations look like? What constitutes an adequate semantic representation? Should it follow a particular (formal) theory of semantics, or rather take an independent stance? Which semantic phenomena should it aim to cover?

Posing these questions is, moreover, timely. As pointed out above, wide-coverage systems that claim to have genuine semantic components are now emerging, and we need an unambiguous way of evaluating these systems in order to measure progress and to benchmark them. The key question is whether comparing to a gold standard (the so-called “Glass-Box” method) is an effective methodology for assessing semantic adequacy. Annotating text with semantic representations is an immense task, with many choices to make, as I will show in this paper. Alternatively, one could take “Black-Box” approaches to semantic evaluation. I will discuss two such methods in this paper: the task of recognising textual entailment, and the task of textual model checking.

2. Glass-Box Evaluation

For glass-box evaluation we need to decide on two issues. The first is a global choice and concerns the nature of the representation language. The second concerns the depth of analysis of the various semantic phenomena that one needs to consider.

2.1 Which Representation Language?

In the scope of this paper, what I mean by a semantic representation is an interpretable structure, in other words, a logical form with a model-theoretic semantics. Such a representation has a logical foundation.
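To illustrate what “a logical form with a model-theoretic semantics” buys us, here is a minimal sketch of a model checker that decides whether a first-order formula is true in a finite model. The nested-tuple formula encoding and the names `Model` and `satisfies` are assumptions made for this example only. Because the domain is finite, each quantifier reduces to a loop over its elements, so checking truth in a model always terminates; deciding first-order validity in general, by contrast, is only semi-decidable.

```python
from typing import Any, Dict, Set, Tuple

# Hypothetical formula encoding, chosen only for this sketch:
#   ("pred", name, v1, ..., vn)                      atomic formula over variables
#   ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g)
#   ("forall", v, f), ("exists", v, f)
Formula = Tuple[Any, ...]


class Model:
    """A finite first-order model: a domain plus an interpretation of predicates."""

    def __init__(self, domain: Set[str], interp: Dict[str, Set[Tuple[str, ...]]]) -> None:
        self.domain = domain
        self.interp = interp  # predicate name -> set of argument tuples


def satisfies(m: Model, f: Formula, g: Dict[str, str]) -> bool:
    """Decide whether model m satisfies formula f under variable assignment g."""
    op = f[0]
    if op == "pred":
        # Look up the tuple of entities assigned to the argument variables.
        return tuple(g[v] for v in f[2:]) in m.interp.get(f[1], set())
    if op == "not":
        return not satisfies(m, f[1], g)
    if op == "and":
        return satisfies(m, f[1], g) and satisfies(m, f[2], g)
    if op == "or":
        return satisfies(m, f[1], g) or satisfies(m, f[2], g)
    if op == "implies":
        return not satisfies(m, f[1], g) or satisfies(m, f[2], g)
    if op == "forall":
        # Finite domain: universal quantification is a loop over all entities.
        return all(satisfies(m, f[2], {**g, f[1]: d}) for d in m.domain)
    if op == "exists":
        return any(satisfies(m, f[2], {**g, f[1]: d}) for d in m.domain)
    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    model = Model(
        domain={"d1", "d2", "c1"},
        interp={
            "dog": {("d1",), ("d2",)},
            "cat": {("c1",)},
            "chase": {("d1", "c1"), ("d2", "c1")},
        },
    )
    # Every dog chases a cat: forall x (dog(x) -> exists y (cat(y) & chase(x, y)))
    every_dog = ("forall", "x", ("implies", ("pred", "dog", "x"),
                 ("exists", "y", ("and", ("pred", "cat", "y"), ("pred", "chase", "x", "y")))))
    # Some cat chases a dog: exists x (cat(x) & exists y (dog(y) & chase(x, y)))
    some_cat = ("exists", "x", ("and", ("pred", "cat", "x"),
                ("exists", "y", ("and", ("pred", "dog", "y"), ("pred", "chase", "x", "y")))))
    print(satisfies(model, every_dog, {}))
    print(satisfies(model, some_cat, {}))
```

In this toy model, “Every dog chases a cat” comes out true and “Some cat chases a dog” comes out false.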
There are many choices we can make here, among them representation languages based on:

• propositional logic;
• some description logic (many choices here);
• some modal logic (many choices here, too);
• first-order logic (i.e. predicate logic);
• higher-order logic (i.e. lambda calculus).

The list above is, by and large, ordered by expressive power. An expressive language is nice to have, but often the price to pay is high. Assuming that, in the context of scalable language technology, we take “semantic analysis” not just as the task of representing meaning, but also as the task of automated reasoning with the produced meaning representations, a compromise between expressive power and practical reasoning capabilities is unavoidable. At one end of the spectrum is propositional logic, a logic with very attractive complexity properties, but with very limited means to model any interesting

Publication date: 2008